
32 research outputs found

RealKrimp - Finding Hyperintervals that Compress with MDL for Real-Valued Data

Author: Duijvestein W.
Grünwald P.D. (Peter)
Knobbe A.J. (Arno)
Witteveen J.E.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2014

Crossref

CWI's Institutional Repository

Ghent University Academic Bibliography

International Migration, Integration and Social Cohesion online publications

Effects of pacing properties on performance in long-distance running

Author: Knobbe A.J.
Leeuw A. de
Meerhoff L.A.
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 13/11/2018

This article focuses on the performance of runners in official races. Based on extensive public data from participants of races organized by the Boston Athletic Association, we demonstrate how different pacing profiles can affect the performance in a race. An athlete's pacing profile refers to the running speed at various stages of the race. We aim to provide practical, data-driven advice for professional as well as recreational runners. Our data collection covers 3 years of data made public by the race organizers, and primarily concerns the times at various intermediate points, giving an indication of the speed profile of the individual runner. We consider the 10 km, half marathon, and full marathon, leading to a data set of 120,472 race results. Although these data were not primarily recorded for scientific analysis, we demonstrate that valuable information can be gleaned from these substantial data about the right way to approach a running challenge. In this article, we focus on the role of race distance, gender, age, and the pacing profile. Since age is a crucial but complex determinant of performance, we first model the age effect in a gender- and distance-specific manner. We consider polynomials of high degree and use cross-validation to select models that are both accurate and of sufficient generalizability. After that, we perform clustering of the race profiles to identify the dominant pacing profiles that runners select. Finally, after having compensated for age influences, we apply a descriptive pattern mining approach to select reliable and informative aspects of pacing that most determine an optimal performance. The mining paradigm produces relatively simple and readable patterns, such that both professionals and amateurs can use the results to their benefit.

Algorithms and the Foundations of Software technolog
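The model-selection step mentioned in the abstract above (choosing a polynomial degree by cross-validation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic data, the helper names `fit_polynomial`, `predict` and `cv_error`, the 5-fold split and the degree range are hypothetical and are not taken from the article.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    # Augmented normal-equation system [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (rows[i][n] - tail) / rows[i][i]
    return coeffs

def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def cv_error(xs, ys, degree, folds=5):
    """Mean squared error over held-out folds (fold = index modulo `folds`)."""
    total = 0.0
    pairs = list(zip(xs, ys))
    for f in range(folds):
        train = [p for i, p in enumerate(pairs) if i % folds != f]
        held_out = [p for i, p in enumerate(pairs) if i % folds == f]
        coeffs = fit_polynomial([x for x, _ in train], [y for _, y in train], degree)
        total += sum((predict(coeffs, x) - y) ** 2 for x, y in held_out)
    return total / len(pairs)

# Synthetic stand-in for a rescaled predictor and a response (hypothetical data).
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [1 + 2 * x * x + rng.gauss(0, 0.05) for x in xs]

errors = {d: cv_error(xs, ys, d) for d in range(1, 6)}
best_degree = min(errors, key=errors.get)
```

On this toy data a straight line cannot follow the curvature, so its held-out error is far larger than that of any degree from 2 upward; cross-validation then picks among the higher degrees by how well each generalizes to unseen points.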

Leiden University Scholary Publications

Multi-relational data mining

Author: Blockeel H.
Knobbe A.J. (Arno)
Siebes A.P.J.M. (Arno)
Wallen D.M.G. van der
Publication venue: CWI
Publication date: 01/01/1999

An important aspect of data mining algorithms and systems is that they should scale well to large databases. A consequence of this is that most data mining tools are based on machine learning algorithms that work on data in attribute-value format. Experience has proven that such 'single-table' mining algorithms indeed scale well. The downside of this format is, however, that more complex patterns are simply not expressible in this format and, thus, cannot be discovered. One way to enlarge the expressiveness is to generalize, as in ILP, from one-table mining to multiple table mining, i.e., to support mining on full relational databases. The key step in such a generalization is to ensure that the search space does not explode and that efficiency and, thus, scalability are maintained. In this paper we present a framework and an architecture that provide such a generalization. In this framework the semantic information in the database schema, e.g., foreign keys, is exploited to prune the search space and, in the architecture, database primitives are defined to ensure efficiency. Moreover, the framework induces a canonical generalization of algorithms, i.e., if the generalized algorithms are run on a single table database, they give the same results as their single-table counterparts. The framework is illustrated by the Warmr algorithm, which is a multi-relational generalization of the Apriori algorithm.
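For readers unfamiliar with the single-table baseline named in the abstract above, the following is a minimal sketch of classic Apriori frequent-itemset mining. It is not Warmr and not the paper's multi-relational framework; the function name `apriori`, the toy baskets and the support threshold are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

With a threshold of 2, all three single items and all three pairs are frequent, while the triple appears in only one basket and is dropped. The prune step is what keeps the search space from exploding, which is the same concern the abstract raises for the multi-relational setting.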

CWI's Institutional Repository

Predefined pattern detection in large time series

Author: Miao S.
Vespier U.
De Gouveia da Costa Cachucho R.E.
Meeng M.
Knobbe A.J.
Publication venue
Publication date: 01/01/2016

Predefined pattern detection from time series is an interesting and challenging task. In order to reduce its computational cost and increase effectiveness, a number of time series representation methods and similarity measures have been proposed. Most of the existing methods focus on full sequence matching, that is, sequences with clearly defined beginnings and endings, where all data points contribute to the match. These methods, however, do not account for temporal and magnitude deformations in the data and prove ineffective in several real-world scenarios where noise and external phenomena introduce diversity in the class of patterns to be matched. In this paper, we present a novel pattern detection method, which is based on the notions of templates, landmarks, constraints and trust regions. We employ the Minimum Description Length (MDL) principle for the time series preprocessing step, which helps to preserve all the prominent features and prevents the template from overfitting. Templates are provided by common users or domain experts, and represent interesting patterns we want to detect from time series. Instead of utilising templates to match all the potential subsequences in the time series, we translate the time series and templates into landmark sequences, and detect patterns from the landmark sequence of the time series. Through defining constraints within the template landmark sequence, we effectively extract all the landmark subsequences from the time series landmark sequence, and obtain a number of landmark segments (time series subsequences or instances). We model each landmark segment through scaling the template in both temporal and magnitude dimensions. To suppress the influence of noise, we introduce the concept of a trust region, which not only helps to achieve an improved instance model, but also helps to catch the accurate boundaries of instances of the given template. Based on the similarities derived from instance models, we introduce the probability density function to calculate a similarity threshold. The threshold can be used to judge if a landmark segment is a true instance of the given template or not. To evaluate the effectiveness and efficiency of the proposed method, we apply it to two real-world datasets. The results show that our method is capable of detecting patterns with temporal and magnitude deformations with competitive performance.

Leiden University Scholary Publications

Modeling match performance in elite volleyball players: importance of jump load and strength training characteristics

Author: Baar R. van.
Knobbe A.J.
Leeuw A.W. de
Zwaard S. van der
Publication venue: 'MDPI AG'
Publication date: 20/10/2022

In this study, we investigated the relationships between training load, perceived wellness and match performance in professional volleyball by applying the machine learning techniques XGBoost, random forest regression and subgroup discovery. Physical load data were obtained by manually logging all physical activities and using wearable sensors. Daily wellness of players was monitored using questionnaires. Match performance was derived from annotated actions by a video scout during matches. We identified conditions of predictor variables that related to attack and pass performance (p < 0.05). Better attack performance is related to heavy weights of lower-body strength training exercises in the preceding four weeks. However, worse attack performance is linked to large variations in weights of full-body strength training exercises, excessively heavy upper-body strength training, low jump heights and small variations in the number of high jumps in the four weeks prior to competition. Lower passing performance was associated with small variations in the number of high jumps in the preceding week and an excessive amount of high jumps performed, on average, in the two weeks prior to competition. Differences in findings with respect to passing and attack performance suggest that elite volleyball players can improve their performance if training schedules are adapted to the position of a player.

Algorithms and the Foundations of Software technolog

Leiden University Scholary Publications

Anomaly detection in urban drainage with stereovision

Author: Bäck T. (Thomas)
Knobbe A.J. (Arno)
Luimes R. (Rianne)
Meijer D.W.J. (Dirk)
Publication venue: 'Elsevier BV'
Publication date: 01/07/2022

This work introduces RADIUS, a framework for anomaly detection in sewer pipes using stereovision. The framework employs three-dimensional geometry reconstruction from stereo vision, followed by statistical modeling of the geometry with a generic pipe model. The framework is designed to be compatible with existing workflows for sewer pipe defect detection, as well as to provide opportunities for machine learning implementations in the future. We test the framework on 48 image sets of 26 sewer pipes in different conditions collected in the lab. Of these 48 image sets, 5 could not be properly reconstructed in three dimensions due to insufficient stereo matching. The surface fitting and anomaly detection performed well: a human-graded defect severity score had a moderate, positive Pearson correlation of 0.65 with our calculated anomaly scores, making this a promising approach to automated defect detection in urban drainage
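The evaluation statistic quoted in the abstract above is a Pearson correlation between human-graded severity and computed anomaly scores. As a minimal sketch of how such a coefficient is computed, assuming made-up scores (the lists `severity` and `anomaly` below are hypothetical and are not the paper's data):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

severity = [1, 2, 3, 4, 5]   # hypothetical human-graded scores
anomaly = [2, 1, 4, 3, 5]    # hypothetical computed anomaly scores
r = pearson(severity, anomaly)
```

A value of 1 would mean the two rankings rise together perfectly and a value near 0 would mean no linear relationship.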

CWI's Institutional Repository

Pure OAI Repository

Leiden University Scholary Publications

Personalized machine learning approach to injury monitoring in elite volleyball players

Author: Baar R. van
Knobbe A.J.
Leeuw A.W. de
Zwaard S. van der
Publication venue: 'Informa UK Limited'
Publication date: 25/02/2021

LIACS-Managemen

Leiden University Scholary Publications

A recurrent neural network architecture to model physical activity energy expenditure in older people

Author: Beekman M.
Knobbe A.J.
Okai J.
Paraschiakos S.
Slagboom P.E.
Sá C.R. de
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/01/2022

Through the quantification of physical activity energy expenditure (PAEE), health care monitoring has the potential to stimulate vital and healthy ageing, inducing behavioural changes in older people and linking these to personal health gains. To be able to measure PAEE in a health care perspective, methods from wearable accelerometers have been developed, however, mainly targeted towards younger people. Since elderly subjects differ in energy requirements and range of physical activities, the current models may not be suitable for estimating PAEE among the elderly. Furthermore, currently available methods seem to be either simple but non-generalizable or require elaborate (manual) feature construction steps. Because past activities influence present PAEE, we propose a modeling approach known for its ability to model sequential data, the recurrent neural network (RNN). To train the RNN for an elderly population, we used the growing old together validation (GOTOV) dataset with 34 healthy participants of 60 years and older (mean 65 years old), performing 16 different activities. We used accelerometers placed on wrist and ankle, and measurements of energy counts by means of indirect calorimetry. After optimization, we propose an architecture consisting of an RNN with 3 GRU layers and a feedforward network combining both accelerometer and participant-level data. Our efforts included switching mean to standard deviation for down-sampling the input data and combining temporal and static data (person-specific details such as age, weight, BMI). The resulting architecture produces accurate PAEE estimations while decreasing training input and time by a factor of 10. Subsequently, compared to the state-of-the-art, it is capable of integrating longer activity data, which leads to more accurate EE estimations for low-intensity activities. It can thus be employed to investigate associations of PAEE with vitality parameters of older people related to metabolic and cognitive health and mental well-being.

Algorithms and the Foundations of Software technolog
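The abstract above mentions switching from the mean to the standard deviation when down-sampling the accelerometer input. A minimal sketch of why that choice can matter, under stated assumptions: the helper `downsample`, the window size of 4 and the toy signal are hypothetical and are not the study's preprocessing pipeline.

```python
from statistics import mean, pstdev

def downsample(signal, window, reducer):
    """Reduce each non-overlapping window of `signal` to a single value."""
    return [
        reducer(signal[i:i + window])
        for i in range(0, len(signal) - window + 1, window)
    ]

# Toy one-axis signal: four samples at rest, then four samples of
# oscillating movement around zero.
accel = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]

by_mean = downsample(accel, 4, mean)    # both windows average to zero
by_std = downsample(accel, 4, pstdev)   # only the moving window is non-zero
```

In this toy case the mean cannot tell rest from movement, because the oscillation cancels out, whereas the standard deviation keeps the movement intensity of each window in a single number.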

Leiden University Scholary Publications

Alone at the Playground

Author: Cachucho R.
De Leng W.
Ketelaar L.
Knobbe A.J.
Kok J.N.
Neto C.
Rieffe C.
Veiga G.
Publication venue: 'Informa UK Limited'
Publication date: 17/02/2016

Algorithms and the Foundations of Software technolog

Leiden University Scholary Publications

Repositório Científico da Universidade de Évora

Unlocking the potential of big data to support tactical performance analysis in professional soccer: A systematic review

Author: Brink M.S.
Bueno M.J.O.
Cunha S.A.
Elferink-Gemser M.T.
Goes F.R.
Knobbe A.J.
Lemmink K.A.P.M.
Meerhoff L.A.
Moura F.A.
Rodrigues D.M.
Torres R.S.
Publication venue: 'Informa UK Limited'
Publication date: 16/04/2020

In professional soccer, increasing amounts of data are collected that harness great potential when it comes to analysing tactical behaviour. Unlocking this potential is difficult as big data challenges the data management and analytics methods commonly employed in sports. By joining forces with computer science, solutions to these challenges could be achieved, helping sports science to find new insights, as is happening in other scientific domains. We aim to bring multiple domains together in the context of analysing tactical behaviour in soccer using position tracking data. A systematic literature search for studies employing position tracking data to study tactical behaviour in soccer was conducted in seven electronic databases, resulting in 2338 identified studies and finally the inclusion of 73 papers. Each domain clearly contributes to the analysis of tactical behaviour, albeit in - sometimes radically - different ways. Accordingly, we present a multidisciplinary framework where each domain's contributions to feature construction, modelling and interpretation can be situated. We discuss a set of key challenges concerning the data analytics process, specifically feature construction, spatial and temporal aggregation. Moreover, we discuss how these challenges could be resolved through multidisciplinary collaboration, which is pivotal in unlocking the potential of position tracking data in sports analytics.

Algorithms and the Foundations of Software technolog

Leiden University Scholary Publications